A Grammar of the Archaeological Record (Version 2, Beta release)

Categorization

Categorization principles

Giorgio Buccellati – August 2023, July 2024


Labels, properties and codes

We have already described briefly (2.5) the distinction between these three concepts.

Labels serve as identifiers of constituents;
properties are the analytical traits which, together, define those same constituents; they subsume variables and variants;
codes are alphanumeric strings which refer to specific constituents or properties:
- codes for constituents have already been discussed above (S3), while
- codes for properties, i.e., for variables and variants, are discussed in this section.

For example, let us consider A15f4:

A15f4 is the constituent label;
the roster property consists of the variable "Definition", rendered by either
- the mnemonic code "df", or
- the structural (alphanumeric) code "B10" (see the Main Roster and the detailed reference);
the lexical property consists of the variant "brickfall", rendered by
- the mnemonic code "bf".
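The three-way distinction can be pictured as a small data structure. This is a minimal sketch that assumes a record keeps the constituent label together with pairs of (variable, variant code). The class and field names are hypothetical. Only the codes A15f4, df, B10 and bf come from the example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variable:
    """A roster slot (hypothetical class), reachable by either of its two equivalent codes."""
    name: str        # e.g. "Definition"
    mnemonic: str    # e.g. "df"
    structural: str  # e.g. "B10"


@dataclass
class Constituent:
    """A constituent (hypothetical class): a label plus its coded properties."""
    label: str  # identifier of the constituent, e.g. "A15f4"
    # Each property pairs a variable (roster slot) with the code of its variant.
    properties: dict = field(default_factory=dict)

    def set(self, variable: Variable, variant_code: str) -> None:
        self.properties[variable] = variant_code


DEFINITION = Variable("Definition", "df", "B10")

a15f4 = Constituent("A15f4")
a15f4.set(DEFINITION, "bf")  # variant "brickfall", mnemonic code "bf"
```

The label identifies the constituent. The variable and variant together form one property, and each of them is reached through its own code.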

Rosters: codes for variables

I use the term “roster” to refer to the structured list of codes used for variables. The main roster includes all the codes which are needed for stratigraphic analysis and for a minimal typological analysis. Special rosters provide additional information for distinct typologies (and occasionally for special types of stratigraphic analysis, such as microstratigraphy).

All rosters are identified by a three-character prefix. The first character is the capital letter Z, which signals a roster code. The next two consist of any alphanumeric combination that identifies the roster in question: thus the code “Zmr” refers to the Main Roster, and the code “Zai” indicates that the secondary roster for Aglyptic Impressions on Sealings is being used.

Successive versions are possible for any given roster. The version is identified by a three-digit numeric code, separated from the roster code by a hyphen. This version indicator must be placed at the beginning of each file. For example, “Zmr-022” identifies the current (22nd) version of the Main Roster, and “Zsi-001” identifies the current (first) version of the Seal Impression roster.

The Main Roster is assumed by default. In other words, if no roster code is used, then the code for any given roster slot belongs to the Main Roster.
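The following is a minimal sketch of how a program might read these identifiers. It assumes the prefix and the optional version appear together as a single token such as “Zmr-022”. The regular expression and the function name are illustrative and are not part of the system as defined here.

```python
import re
from typing import Optional, Tuple

# "Z" + two alphanumeric characters, optionally followed by a hyphen and a
# three-digit version number, e.g. "Zmr-022" or "Zai".
ROSTER_ID = re.compile(r"Z([A-Za-z0-9]{2})(?:-(\d{3}))?")

MAIN_ROSTER = "Zmr"  # assumed by default when no roster code is given


def parse_roster_id(text: Optional[str]) -> Tuple[str, Optional[int]]:
    """Return (roster prefix, version number); the version is None if not stated."""
    if not text:
        # No roster code at all: the slot belongs to the Main Roster.
        return MAIN_ROSTER, None
    match = ROSTER_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a roster code: {text!r}")
    roster = "Z" + match.group(1)
    version = int(match.group(2)) if match.group(2) else None
    return roster, version
```

For example, `parse_roster_id("Zmr-022")` yields `("Zmr", 22)`, and an empty or missing code falls back to the Main Roster.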

The structure of the roster is uniform in all cases. It consists of subgroups of variables, sorted according to their logical sequence. Within each subgroup, there are two parallel sets of codes, structural and mnemonic, and an explanation of the value of that particular property.

The structural code consists of a two- or three-character alphanumeric sequence. The first character is an UPPER CASE letter, which corresponds to the category to which the code belongs (e.g., volumetric localization). The remaining character or characters (a digit or a lower case letter, or two digits as in B10) correspond to the sequential number within the list of variables for that particular subgroup.

The mnemonic code is an optional variant of the structural code. It consists of two lower case letters, which echo the key word defining the variable: for example, “ht” and “lg”, for “height” and “length” respectively, are found within the subgroup of variables called “Measurements” in the Main Roster. The category letter for this same subgroup is J. Since height and length are the first two variables in the subgroup, their structural codes are J1 and J2, respectively.

Structural and mnemonic codes are wholly identical in their function. In the input, only one of the two may be entered for a given variable: thus either “J1” or “ht”, either “J2” or “lg”, etc.
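Because the two forms are interchangeable, data entry can normalize every variable code to one form before storing it. This also makes it easy to reject an entry that names the same variable twice. The following is a minimal sketch under those assumptions. The lookup table is an excerpt that contains only the pairs cited in this section, and the function names are hypothetical.

```python
import re

# Structural code: an upper-case category letter followed by one or two
# digits or lower-case letters (e.g. "J1", "B10").
STRUCTURAL = re.compile(r"[A-Z][0-9a-z]{1,2}")
# Mnemonic code: exactly two lower-case letters (e.g. "ht").
MNEMONIC = re.compile(r"[a-z]{2}")

# Excerpt of the Main Roster: only the pairs cited in the text.
MNEMONIC_TO_STRUCTURAL = {
    "ht": "J1",   # height
    "lg": "J2",   # length
    "df": "B10",  # definition
    "ds": "B11",  # description
    "qc": "B20",
    "co": "K5",   # color
}
STRUCTURAL_CODES = set(MNEMONIC_TO_STRUCTURAL.values())


def normalize(code: str) -> str:
    """Return the structural form of a variable code given in either form."""
    if STRUCTURAL.fullmatch(code) and code in STRUCTURAL_CODES:
        return code
    if MNEMONIC.fullmatch(code) and code in MNEMONIC_TO_STRUCTURAL:
        return MNEMONIC_TO_STRUCTURAL[code]
    raise ValueError(f"unknown variable code: {code!r}")


def check_entry(codes: list) -> list:
    """Normalize an input line, rejecting a variable entered twice (e.g. "J1" and "ht")."""
    seen = set()
    result = []
    for code in codes:
        structural = normalize(code)
        if structural in seen:
            raise ValueError(f"variable {structural} entered more than once")
        seen.add(structural)
        result.append(structural)
    return result
```

With this check, `["ht", "J2"]` is accepted as `["J1", "J2"]`, while `["J1", "ht"]` is rejected because both codes name the same variable.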

A list of all mnemonic codes in use can be found in chapter 6.6.

Special rosters (see below, 8) are identical in structure to the Main Roster. They can be developed as the need arises, and their internal codes are wholly independent of one another across all rosters (Main and Special): thus “sp” means “specific label” within the Main Roster, but “spin” within the secondary roster Zai (Aglyptic Impressions on Sealings).
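Since codes are independent across rosters, a code is only meaningful together with the roster it belongs to. The effective key is therefore the pair (roster, code). This minimal sketch uses a hypothetical `lookup` function with a two-entry excerpt taken from the example above. It treats the Main Roster as the default.

```python
# Each roster is an independent namespace: the same code may mean
# different things in different rosters (excerpts from the text only).
ROSTERS = {
    "Zmr": {"sp": "specific label"},  # Main Roster
    "Zai": {"sp": "spin"},            # Aglyptic Impressions on Sealings
}


def lookup(code: str, roster: str = "Zmr") -> str:
    """Resolve a code within one roster; the Main Roster is the default."""
    try:
        return ROSTERS[roster][code]
    except KeyError:
        raise KeyError(f"{code!r} is not defined in roster {roster!r}") from None
```

Here `lookup("sp")` returns "specific label", while `lookup("sp", "Zai")` returns "spin".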

A roster will also indicate whether the particular set of variants expected in any given slot needs to use a lexicon or a standard, or else whether the variant can be in free format. For instance, in the Main Roster, the variable B10 = df (“definition”) uses a lexicon, the variable B11 = ds (“description”) is in free format, and the variable B20 = qc requires a numeric standard.
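The distinction between lexicon, standard and free format can be stored with each slot and used to check entries. The following is a minimal sketch that covers only the three Main Roster slots named above. It makes two assumptions: that a "numeric standard" can be checked simply as "parses as a number", and that the lexicon for B10 is supplied by the caller. The names `EntryMode` and `validate` are hypothetical.

```python
from enum import Enum


class EntryMode(Enum):
    LEXICON = "lexicon"    # value must be one of a list of coded variants
    FREE = "free format"   # any text is accepted
    STANDARD = "standard"  # value must be obtained through a standard (here: numeric)


# Entry modes for the three Main Roster slots mentioned in the text.
SLOT_MODES = {"B10": EntryMode.LEXICON, "B11": EntryMode.FREE, "B20": EntryMode.STANDARD}


def validate(slot: str, value: str, lexicon: frozenset = frozenset()) -> bool:
    """Check one entry against the entry mode its slot prescribes."""
    mode = SLOT_MODES[slot]
    if mode is EntryMode.LEXICON:
        return value in lexicon
    if mode is EntryMode.STANDARD:
        try:
            float(value)
        except ValueError:
            return False
        return True
    return True  # free format: anything goes
```

For example, "bf" passes for B10 if "bf" is in the supplied lexicon, and "12" passes for B20 while "many" does not.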

Lexica: codes for variants

A lexicon lists the codes for variants used to match given variables, i.e., to fill a pertinent roster slot.

A few of these lexica are in standard use, e.g., the “Munsell Color Chart” is in fact a lexicon, from which values are taken to fill the slot for color K5 = co (“color”). Such lexica are obviously presupposed by the Roster, and need not be given in this grammar.
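A value taken from the Munsell lexicon for slot K5 = co has an internal structure: a hue (a step number plus a hue family such as R or YR), then value/chroma. This minimal sketch parses that notation into its parts, as in “10R 5/1” or “10YR8/2”. It handles chromatic colors only; the neutral greys written with “N” are not covered. The function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class MunsellColor(NamedTuple):
    hue_step: float   # e.g. 10 in "10R"
    hue_family: str   # one of the five principal or five intermediate hues
    value: float      # lightness, e.g. 5 in "5/1"
    chroma: float     # saturation, e.g. 1 in "5/1"


# Two-letter families first, so that "YR" is not read as "Y" followed by "R".
HUE_FAMILIES = ("YR", "GY", "BG", "PB", "RP", "R", "Y", "G", "B", "P")
MUNSELL = re.compile(
    r"(\d+(?:\.\d+)?)(" + "|".join(HUE_FAMILIES) + r")\s*(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)"
)


def parse_munsell(text: str) -> MunsellColor:
    """Split a chromatic Munsell notation such as "10R 5/1" into its components."""
    match = MUNSELL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a chromatic Munsell notation: {text!r}")
    step, family, value, chroma = match.groups()
    return MunsellColor(float(step), family, float(value), float(chroma))
```

Both spellings used in this chapter, with and without a space before value/chroma, are accepted.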

The other lexica are given below (chapters 7 and 9). But it is important first to discuss the range of applicability of the very concept of lexical definition, for it is a moot question whether precise lexical definitions ought to be given for every variant. In some cases, lexical definitions may be considered obvious, and no explanation may seem necessary: a cylinder seal or a cuneiform tablet may possibly be so considered. But as soon as one probes a little deeper, alternatives emerge. For instance, one may define a color impressionistically as “reddish”, or technically as “reddish gray” (= Munsell 10R 5/1): the latter is derived from a lexicon in the technical sense used here, the former from common English usage.

One important consideration is to combine precision with efficiency. The choice is conditioned by specific field situations, which will recommend different strategies. If the strategy requires it, means are available to match accuracy with precision; in other words, a precise lexicon may be used if the specific situation warrants spending the extra time that may be needed to obtain it. Precision depends on the availability of the lexicon, accuracy on the decision to use it.

Standards: criteria for variants

The reason for a possible conflict between precision and accuracy is that, in some cases, precision can only be obtained by employing standards, and this requires extra time. For instance, the standard for measurements is the centimeter: stating that a distance is “great” may be considered equivalent to stating that its length is “325 cms.” Two distinct lexica are used, but a standard is needed only for the second. The difference, of course, is that more time is required to measure the distance with a tape (the standard) and thereby obtain the more precise lexical definition.

A standard, then, is a tool that allows us to apply certain parameters in order to obtain a more precise lexical definition. Thus a Munsell Color Chart is both a lexicon (in that it provides a reasoned sequence of discrete definitions) and a standard (in that it contains means of matching the definition against a measurable example).

It may be assumed as normal that in most situations one will use a tape and give a precise measure in centimeters; hence measurements are less frequently given with adjectives than with numbers. On the other hand, one may not expect in quite the same way to use a Munsell Color Chart each time one refers to a color: the adjective “reddish” for color (instead of the value “10R 5/1”) may be used more readily than the adjective “great” for distance (instead of “325 cms.”).

A centimeter tape is a readily available standard, a Munsell Color Chart slightly less so, if nothing else because of the moderate costs involved and the greater time expenditure required. Yet in other cases, standards are not readily available, even when cost is not a factor. What is often missing, in other words, is simply the methodological framework within which to identify standards. For example, one would normally say “hard” without using a penetrometer, or “clayish” without referring to explicit criteria of identification, or “oblique” without stating the degree of axial inclination.

Note that one need not aim for maximal precision at all times. That is precisely what I meant by referring to the question of strategy. Opting for greater or lesser precision should be a deliberate choice: one that is faced consciously in the first place and then made explicit in the record. But a problem that is not, in my opinion, adequately addressed is that in many cases standards are missing even when one would, strategically, want to use one.
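One way to make the choice of precision explicit in the record is to store each value together with the standard, if any, against which it was obtained. This is a minimal sketch under that assumption. The class and field names are hypothetical. The values “reddish”, “10R 5/1” and 325 cm are the examples used above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One filled slot, recording how precise the entry deliberately is."""
    slot: str                       # variable code, e.g. "co" (color) or "lg" (length)
    value: str                      # e.g. "reddish" or "10R 5/1"
    standard: Optional[str] = None  # e.g. "Munsell" or "cm"; None if no standard was used

    @property
    def precise(self) -> bool:
        """True when the value was obtained by applying a standard."""
        return self.standard is not None


impressionistic = Observation("co", "reddish")
measured = Observation("co", "10R 5/1", standard="Munsell")
length = Observation("lg", "325", standard="cm")
```

A later query can then separate impressionistic entries from standardized ones, instead of having to guess from the form of the value.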

Some such standards are introduced here and presented below (chapter 20). They cover a variety of areas, with particular emphasis on stratigraphy. Others may of course be added at any time and, through the definition of a proper matching lexicon, integrated within the overall system.


The basic concepts

The categorization system is the formal means for ensuring coherence in the data. As such, it serves both

a practical purpose, in that it provides a uniform guideline for data entry,
and a theoretical purpose, in that it provides a structural definition of the constituents, aiming to define the substance of each constituent.

The categorization system rests on the roster and the lexicon as the major elements of the system. They are defined in what follows, with regard to (1) the practical and (2) the theoretical aspects.

Roster

(1) In practical terms, the roster is an organic series of slots that are to be filled with individually defined content. The slots thus serve as variables, meaning that their content can vary according to parameters given in the lexicon. Thus a slot that requires a definition may be filled with a code that defines a constituent, e.g., the shape of a ceramic vessel as a jar.

(2) Theoretically, the roster is a paradigm in the form of a structured set of categories. These are organically related to each other, so as to give the full range of possible characteristics of the constituent in question. Thus the categories for a ceramic vessel would include, besides shape, also ware, decoration, color, measurements, function and time assignment: together, they “categorize,” fully, the constituent in question.

Categories may be either parallel or nested. For an example of the latter see the shapes in the Ceramic special roster.

Lexicon

(1) The lexicon is a set of variants that can be applied to a given variable understood as a slot of the roster. Thus in the slot for definition, a ceramic vessel may in turn be defined as a jar, a bowl, a cup, etc.

(2) Theoretically, the lexical entries are seen as attributes. They are analytical components that match specific criteria, down to the most minute detail. Thus the lexical code jn.h109 defines a “short necked jar: neck slightly flared outward, restricted slightly at base of neck, neck and rim much smaller diameter than widest part of body.”

Standards

Standards are sets of recognized and established criteria that serve as a rule for measurable attributes of given elements. Thus, metrical standards are standards for size (in centimeters) or weight (in grams), and a Munsell Chart code is a standard used instead of a generic lexical term such as “red”.

Codes

The main lexicon and the special lexica are lists of coded entries (e.g., “fa” for “floor of type a”). The use of codes is important to facilitate standardization, and thus the creation of indices and statistics.

Where no lexicon is available, entries are given in free format.

Closed and open systems

The roster is a closed system. This means that, since the roster is an organic whole, any addition to it must take into account all the other elements of the system.

It also means that there is a hierarchy in the sequence of the codes: thus the codes of ceramic items in the “family” category are valid only if seen within the higher level “main” category, and so forth (see the Ceramics digital book).

The lexicon is an open system. This means that additions can be made at will.
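The contrast between the two systems can be expressed directly in code. In this sketch the roster is a read-only mapping, so that changing it means issuing a new version of the whole roster. The lexicon is a container that accepts new entries at any time. This is a minimal sketch with hypothetical names. One design choice is ours and not the text's: the lexicon refuses to silently redefine an existing code.

```python
from types import MappingProxyType

# Closed system (sketch): the roster is exposed read-only. Any change means
# issuing a new version of the whole roster (cf. "Zmr-022"), not patching a slot.
MAIN_ROSTER = MappingProxyType({"df": "definition", "ds": "description", "co": "color"})


class Lexicon:
    """Open system (sketch): variant codes can be added at will."""

    def __init__(self, entries: dict) -> None:
        self._entries = dict(entries)

    def add(self, code: str, meaning: str) -> None:
        # Adding a variant requires no revision elsewhere, but an existing
        # code is not silently redefined (a choice made for this sketch).
        if code in self._entries:
            raise ValueError(f"code {code!r} is already defined")
        self._entries[code] = meaning

    def __contains__(self, code: str) -> bool:
        return code in self._entries

    def meaning(self, code: str) -> str:
        return self._entries[code]
```

Attempting to assign to `MAIN_ROSTER` raises a `TypeError`, while `Lexicon.add` simply extends the list of variants.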

The -emic and -etic dimensions

The roster reflects an essentially -emic system (on the concepts of “-emic” and “-etic” in archaeology, see e.g. Buccellati, G. 2006). This means that the categories are assumed to reflect a native understanding of the item in question. It further means that they are inventory-specific: the concept of “jar” makes sense in relationship to other terms like bowl or cup. For the higher categories, there are in fact applicable linguistic terms, such as kāsu in Akkadian for a cup.

The degree to which this applies varies. For instance, in the example just given above, the variation could be considered as a lexical rather than a roster element. In other words, it seems less likely that such a minor variation may have been a matter of conscious identification on the part of the users. Even the sub-family reflects a category that may have been perceived as distinct at best by specialists rather than by normal users.

The lexicon, on the other hand, reflects an -etic dimension. This means that the criteria used to define “lexically” any given item are not inventory-specific; they are drawn from systems of measurement or the like that would not have been familiar to the original users.

As an example, we may safely assume that the original users would distinguish colors and would be sensitive to their perception in a way that is essentially the same as ours. On the other hand, defining the color of a given ware as 10YR8/2 (according to the Munsell scale) is extrinsic to the ceramic inventory as such (the Munsell scale was not created for the Urkesh corpus), and measuring color in this way would be alien to Urkesh potters and users.

To say that an -etic dimension is “extrinsic” to the original inventory and alien to the native sensitivity of makers and users does not, of course, mean that it is invalid. Quite the contrary. We need only be aware that these are different levels of analysis, which should not be mixed. It is the same with phonemics and phonetics, where the latter may use complex acoustic measuring systems with parameters that define individual sounds. The difference is that in the case of ancient languages we do not have native speakers, whereas in the case of material culture we do have the original artefacts.

Type

A type is defined either by a single attribute or a cluster of attributes within a given (roster) category.

There is thus a hierarchy of types depending on the complexity of the lexical dimension, a complexity which is reflected in the code used to define the type: defining a ceramic vessel simply as a “jar” is at a higher level than defining it as a “necked jar,” and so forth.

The following chart shows how a type is defined, and gives at the same time a synopsis of the criteria described above to define the roster and the lexicon.

ROSTER		LEXICON		TYPES
variables (closed, -emic)		variants (open, -etic)		
category	code	code	attributes	type code
generic	df	cv	ceramic vessel	
main	ZcaS	j	jar	j
family	ZcaS1	n.	necked	jn.
sub-family	ZcaS2	h	short necked	jn.h
variations	ZcaS3	109	slightly flared outward, restricted slightly at base of neck, neck and rim much smaller diameter than widest part of body	jn.h109

Here is an example of type jn.h, a short necked jar (A8q122.4), without the variation indicated by code 109.
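In the chart, the type code at each level is the running concatenation of the lexical codes from the main category down to that level. This is why lower levels define narrower types. This minimal sketch reproduces that hierarchy with `itertools.accumulate`, whose default operation is `+` and therefore concatenates strings. The function name is hypothetical.

```python
from itertools import accumulate

# (roster category, lexical code) pairs below the generic level, from the chart.
LEVELS = [("main", "j"), ("family", "n."), ("sub-family", "h"), ("variation", "109")]


def type_hierarchy(levels):
    """Pair each roster category with the type code it defines.

    The type code at each level concatenates all lexical codes up to and
    including that level, so each step down yields a narrower type.
    """
    names = [name for name, _ in levels]
    codes = accumulate(code for _, code in levels)  # default operation: +
    return list(zip(names, codes))
```

Stopping at the sub-family, `type_hierarchy(LEVELS[:3])` ends with jn.h, the type of the jar A8q122.4 without the variation 109.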

Sequence and missing attributes

Roster slots are organized in a strict sequential order: this means that each slot depends for its meaning on the preceding one(s). Thus, in the example just given above, the attribute “n.” is valid only if there is an attribute “j” that precedes it immediately.

If an attribute is missing, it is indicated by the minus sign ("-"). An example is the case of a ceramic code that applies only to a particular detail of the shape (say, a rim), when the main shape (say, a bowl or a jar) is missing.

In such a case, the first four roster positions all have "-" as attributes, and an attribute appears only in the fifth position. Thus ----f stands for a particular type of rim ("flat"), as in J1q651-1.
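Because positions are strictly sequential, a type code can be written and read back as long as each missing attribute keeps its place with "-". This is a minimal sketch of both directions. Its per-level lexica contain only codes cited in the text. The level names "variation" and "detail" follow the chart; the fifth position holding the rim code "f" is an assumption, since the text does not name that slot. The functions `encode` and `decode` are hypothetical.

```python
from typing import Dict, Optional

LEVELS = ["main", "family", "sub-family", "variation", "detail"]
LEXICA = {
    "main": {"j"}, "family": {"n."}, "sub-family": {"h"}, "variation": {"109"},
    "detail": {"f"},  # hypothetical fifth slot holding the rim code "f" (flat)
}
MISSING = "-"


def encode(attributes: Dict[str, str]) -> str:
    """Write attributes in strict roster order, with '-' for each missing one."""
    last = max((i for i, level in enumerate(LEVELS) if level in attributes), default=-1)
    return "".join(attributes.get(level, MISSING) for level in LEVELS[: last + 1])


def decode(code: str) -> Dict[str, Optional[str]]:
    """Split a type code into its positions; None marks a missing attribute."""
    result: Dict[str, Optional[str]] = {}
    pos = 0
    for level in LEVELS:
        if pos == len(code):
            break
        if code[pos] == MISSING:
            result[level] = None
            pos += 1
            continue
        # Try the longest entries of this level's lexicon first.
        entries = sorted(LEXICA[level], key=len, reverse=True)
        match = next((e for e in entries if code.startswith(e, pos)), None)
        if match is None:
            raise ValueError(f"no {level} attribute at position {pos} in {code!r}")
        result[level] = match
        pos += len(match)
    if pos != len(code):
        raise ValueError(f"unexpected trailing characters in {code!r}")
    return result
```

Under these assumptions, a rim alone encodes as ----f. A code that begins with "n." is rejected, because "n." is only valid after a main-category attribute such as "j".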